Recently, I submitted an article introducing Smaji CJKV. Several reviewers suggested adding citations and background information to it. These comments make sense: most disciplines, engineering included, progress continuously, and new work is usually built on the foundations laid by its predecessors.

Although more information was requested, I initially found it difficult to add any. Over the past two decades, techniques for recording, encoding, and designing fonts for variant or rarely used characters have all been researched and developed, but they have had little influence. Most of them are independent private systems that cannot be integrated into general-purpose systems. Some are relatively open, but only at the user-interface level; others, although relatively open and standardized, lack the infrastructure to serve as a basis for further development. These systems did not seem worth mentioning as references or prerequisite knowledge.

In 1999, Unicode 3.0 introduced Unicode's own Ideographic Description Characters (IDCs). A sequence built from these characters and components is called an Ideographic Description Sequence (IDS). Because an IDS consists of ordinary Unicode characters, it integrates naturally into the everyday general-purpose systems built on Unicode, has a huge user base, and is easy to use. For example, the two characters of the word "时间" can be written as "⿰日寸" and "⿵门日" respectively. Even a character as complex as "𰻝" (U+30EDD), as in "𰻝𰻝面", can be written as "⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心". At first glance, the system seems complete.
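An IDS is written in prefix notation: each IDC is followed by a fixed number of operands, and each operand is either a component or another IDS. The Python below is a minimal sketch of that rule, not part of Smaji CJKV; `parse_ids` and `leaves` are hypothetical helpers, and only the IDCs ⿰ (U+2FF0) through ⿽ (U+2FFD) are handled, with ⿲ and ⿳ taking three operands and the rest taking two.

```python
# Minimal IDS parser sketch (illustrative, not part of Smaji CJKV).
# Only IDCs U+2FF0 (⿰) through U+2FFD (⿽) are handled here.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFE)}
IDC_ARITY[chr(0x2FF2)] = 3  # ⿲ left to middle and right
IDC_ARITY[chr(0x2FF3)] = 3  # ⿳ above to middle and below


def parse_ids(ids: str):
    """Parse an IDS into nested tuples of the form (idc, operand, ...)."""
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("IDS ended before all operands were given")
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            operands = [node() for _ in range(IDC_ARITY[ch])]
            return (ch, *operands)
        return ch  # a leaf: exactly one encoded character

    tree = node()
    if pos != len(ids):
        raise ValueError("extra characters after a complete IDS")
    return tree


def leaves(tree):
    """Return the leaf components of a parsed IDS, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for operand in tree[1:] for leaf in leaves(operand)]


biang = parse_ids("⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心")
print(leaves(biang))
# prints ['辶', '穴', '月', '幺', '言', '幺', '长', '马', '长', '刂', '心']
```

Every leaf in the tree is a single code point, so a component that Unicode does not encode has no way to appear in it at all.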


The problem, however, is that IDS cannot decompose "丝", because Unicode does not encode the component that looks like "幺" without its last dot. Likewise, "乔" has "夭" on top and "?" below, and that lower component is not encoded either. The same problem arises when decomposing characters such as "与", "乌", "亇", "争", "亥", "以"…

Many "components" or "roots" have no corresponding encoded character, yet this system requires both its domain and its range to consist of characters encoded in Unicode. The design was therefore incomplete from the start: common, even everyday, characters may fall outside what it can describe.

Some private systems, aware of this problem, relaxed the restriction on the domain and introduced private components. However, Chinese characters and their components are composed in diverse ways, and IDS and similar systems can only describe idealized compositions. A slightly less ideal arrangement, such as "⿻" (two components overlapping), is already ambiguous: how exactly do they overlap, in which direction, and by how much? IDS says nothing about this, so the glyph cannot be reconstructed from the IDS. The result is yet another set of broken, incomplete systems.

Still, the review comments prompted me to reconsider whether these past efforts are still valuable, or whether they could become useful again once reworked and refined.

The flaws of past approaches can be summarized as follows:

  1. The domain of composite components is limited

  2. IDS lacks accuracy

  3. Not universal, or narrow in application scenarios

The solution is designed accordingly:

  1. The domain of composite components is limited

    The first step is to lift this restriction in a way that does not create new problems. The following conditions must therefore be met:

    1. The domain must not be limited to characters encoded in Unicode, because that set is incomplete.

    2. The defined base components must be able to compose any character; otherwise the result is just another incomplete system.

    3. Base components must not be added, removed, or modified arbitrarily, so that the composition method does not break or become unstable.

    Given these three requirements, basic strokes are the natural choice. However, the coarse classification into the so-called five basic stroke types is not enough: we need to enumerate at least 63 basic strokes, plus mirror (left-right, up-down) and rotation operators, because Chinese characters include mirrored and inverted forms.

  2. IDS lacks accuracy

    The structures IDS describes follow a few patterns: components are stacked vertically (⿱, ⿳), placed side by side horizontally (⿰, ⿲), fully enclosed (⿴), surrounded on three sides (⿵, ⿶, ⿷, ⿼), or surrounded on two adjacent sides (⿸, ⿹, ⿺, ⿽). In every case the operands combine into a new, well-defined shape. For separated components, we only need to divide the width or height of the frame evenly; each component then adjusts its aspect ratio to its share to obtain its new shape. For surrounding structures, the inner component is best-fitted into the space left by the outer component and scaled down slightly.

    The descriptor ⿻ states that its two operands overlap, which breaks this framework: the component shapes can no longer serve as the basis for computing the arrangement. Neither the descriptors (IDCs) nor their operands (strokes or roots) carry any other intrinsic information to compute from, so the description system fails here.

    We therefore have to introduce additional information to fill the gap. For separating and enclosing descriptors, the component shapes are preserved, and so is the shape of their combination, whose bounding box is its outer frame. The data involved are the size and position of the outer frame, and the size and position of each component once embedded in that frame. From these we obtain the position and size of every component relative to the outer frame, which serves as the origin of the coordinate system.

    Because ⿻ makes the component shapes unusable, neither the outer frame nor the position and size of the components can be computed. What we must supply is therefore the position and size of each component; the outer frame can then be derived as the best-fit box around them.

    To describe positions and sizes in the plane, we need a plane coordinate system.

    The description of plane coordinates is a topic worth expanding on, and we will discuss it later. Now, let’s take a look at defect 3.

  3. Not universal, or narrow in application scenarios

    The Unicode character set is required to be a standard for information interchange, so components or roots must be chosen from the characters it encodes. Those encoded components do not cover all the essential ones, and the descriptive power of Unicode's own Ideographic Description Characters (IDCs) is incomplete. This is what causes defects 1 and 2.

    Universities, technical groups, and commercial organizations outside the Unicode Consortium have also tried to design or implement systems that are both Unicode-compatible and complete in descriptive power. Most come close to completeness, some are incompatible with Unicode, and few are perfect, which limits where they can be applied.

    Another important reason is that the requirements for flexibility and timeliness are hard to meet. For example, a scholar may need to quote excerpts from an ancient book in which several characters have multiple variants not included in the standard. Or a newly unearthed ancient book may contain characters never seen before. Such characters must first be added to the standard, and our computer systems updated, before they can be encoded and displayed properly.

    Meeting these needs means going through a long Unicode process that may fail, which will inevitably hold up the writing of the article.

    Smaji CJKV provides the solution to this defect, so I won't go into detail here.
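The layout rule for the separating descriptors under defect 2, and the best-fit frame needed for ⿻, can be sketched in a few lines of Python. This is an illustrative sketch, not Smaji CJKV code: the `Box` type and the `split_frame` and `bounding_box` helpers are hypothetical, the frame is simply divided evenly as described above, y is assumed to grow downward, and enclosing descriptors are left out because the amount of scaling they need is not specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


def split_frame(idc: str, frame: Box) -> list:
    """Divide a frame evenly among the operands of a separating IDC."""
    if idc in ("\u2FF0", "\u2FF2"):  # ⿰ and ⿲: side by side
        n = 2 if idc == "\u2FF0" else 3
        w = frame.width / n
        return [Box(frame.x + i * w, frame.y, w, frame.height) for i in range(n)]
    if idc in ("\u2FF1", "\u2FF3"):  # ⿱ and ⿳: stacked top to bottom
        n = 2 if idc == "\u2FF1" else 3
        h = frame.height / n
        return [Box(frame.x, frame.y + i * h, frame.width, h) for i in range(n)]
    raise ValueError(f"{idc!r} is not a separating IDC")


def bounding_box(boxes) -> Box:
    """Best-fit outer frame of explicitly placed components (what ⿻ needs)."""
    left = min(b.x for b in boxes)
    top = min(b.y for b in boxes)
    right = max(b.x + b.width for b in boxes)
    bottom = max(b.y + b.height for b in boxes)
    return Box(left, top, right - left, bottom - top)


# Two overlapping components whose positions and sizes are given explicitly.
print(bounding_box([Box(0, 0, 128, 120), Box(0, 114, 128, 14)]))
```

For the separating descriptors the positions follow from the frame alone; for ⿻ the direction is reversed, and the frame is derived from positions that must be supplied explicitly.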

In fact, Smaji CJKV did not originally plan to include a glyph description system; only bitmap or vector images could be submitted. Designing a description system became feasible once the core system was in place and compatible with Unicode. The reviewers' request for supplementary information, mentioned earlier, made me revisit my past experience, and that is how the design of the glyph description language began.

Now let's return to the problem skipped earlier:

  2. IDS lacks accuracy

The idea and method for solving this problem need more space, so they are described in the following subsection.

Glyph Outline Description Language (GOD)

Because documents in this language are XML documents, an XML Schema Definition (XSD) is the most suitable way to describe it. The syntax of the language is specified in the schema document god.xsd.

Create XML document

An XML document consists of an optional XML declaration, an optional document type declaration, and a document (root) element.

The version declaration ensures that future changes to XML will not affect the syntax and semantics of the document, and the encoding declaration tells the XML processor which encoding the document uses. GOD 1.0 documents use XML version 1.0 and the UTF-8 encoding, so their XML declaration is always:

<?xml version="1.0" encoding="UTF-8"?>

Because the XML version defaults to 1.0, and a document without an encoding declaration must be in UTF-8 or UTF-16, this declaration is optional. The following is a complete GOD document:

<?xml version="1.0"?>
<god version="1.0"
  xmlns="http://cjkv.smaji.org/ns/god"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://cjkv.smaji.org/ns/god http://cjkv.smaji.org/xml/1.0/xsd/god.xsd">
  <glyph unicode="516b,0">
    <stroke type="t" x="0" y="0" width="56" height="112"/>
    <stroke type="p" x="76" y="0" width="56" height="112"/>
  </glyph>
</god>

The first line is the optional XML declaration.
Lines 2 and 10 open and close the root element, god, whose main purpose is to state the version of the document: the version attribute on line 2 indicates that it follows the syntax and semantics of GOD 1.0. Line 3 declares the GOD namespace, http://cjkv.smaji.org/ns/god.
Lines 4 and 5 are optional. They associate the document with the XSD of GOD, so that capable text editors can validate the document being edited and offer suggestions such as auto-completion.

The child element glyph has a required attribute, unicode, which gives the Unicode scalar value of the glyph this document describes. Its value is a hexadecimal number, optionally followed by a comma and a value called the variation selector. In the example, the unicode attribute is 516b,0, and 516b is the scalar value of the Chinese character 「八」.
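As an illustration of this attribute format, the sketch below splits a unicode value into its scalar value and its variation selector. It assumes, which the text does not state, that the part after the comma is a decimal integer and that it defaults to 0 when absent; `parse_unicode_attr` is a hypothetical helper, not part of any Smaji library.

```python
def parse_unicode_attr(value: str):
    """Split a glyph unicode attribute such as "516b,0" into (scalar, selector).

    Assumption: the selector after the comma is a decimal integer, 0 if absent.
    """
    scalar_text, _, selector_text = value.partition(",")
    scalar = int(scalar_text.strip(), 16)
    # Reject values outside the Unicode code space and surrogate code points.
    if scalar < 0 or scalar > 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError(f"{scalar_text!r} is not a Unicode scalar value")
    selector = int(selector_text) if selector_text else 0
    return scalar, selector


scalar, selector = parse_unicode_attr("516b,0")
print(chr(scalar), selector)  # prints: 八 0
```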

「八」 consists of two strokes: a throw (撇) followed by a press (捺). The glyph element therefore contains two stroke child elements, of type t (撇) and type p (捺), each giving the stroke's position (x, y), width, and height in the coordinate system. For more information on the stroke types in GOD, please consult god.xsd.
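For readers who want to inspect GOD documents programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the Smaji God library: `read_glyph` is a hypothetical helper that reads the first glyph of a document, checks each stroke type against the 63 codes in the god.xsd excerpt that follows, and assumes the coordinates are integers (god.xsd may allow other number formats). Because the elements live in the GOD namespace, lookups go through a prefix map.

```python
import xml.etree.ElementTree as ET

GOD_NS = {"god": "http://cjkv.smaji.org/ns/god"}

# The 63 stroke type codes listed in the god.xsd excerpt.
STROKE_TYPES = frozenset("""
h sh u du v sv rsv t ft wt d ed ld wd p up hp fp ufp c a o
hj uj ht hsv hv hvj htj utj hvh hvu ha haj hpj htaj htc htht htcj hvhv hthtj
vu vh va vaj vhv vht vhtj vj vc vcj tu th td wtd tht thtj tj cj fpj pj thtaj tod
""".split())


def read_glyph(xml_text: str):
    """Return (version, unicode attribute, strokes) for the first glyph."""
    root = ET.fromstring(xml_text)
    if root.tag != "{http://cjkv.smaji.org/ns/god}god":
        raise ValueError("root element is not <god> in the GOD namespace")
    glyph = root.find("god:glyph", GOD_NS)
    if glyph is None:
        raise ValueError("document has no <glyph> element")
    strokes = []
    for stroke in glyph.findall("god:stroke", GOD_NS):
        kind = stroke.get("type")
        if kind not in STROKE_TYPES:
            raise ValueError(f"unknown stroke type {kind!r}")
        box = tuple(int(stroke.get(a)) for a in ("x", "y", "width", "height"))
        strokes.append((kind, *box))
    return root.get("version"), glyph.get("unicode"), strokes


SAMPLE = """<?xml version="1.0"?>
<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="516b,0">
    <stroke type="t" x="0" y="0" width="56" height="112"/>
    <stroke type="p" x="76" y="0" width="56" height="112"/>
  </glyph>
</god>"""

print(read_glyph(SAMPLE))
# prints ('1.0', '516b,0', [('t', 0, 0, 56, 112), ('p', 76, 0, 56, 112)])
```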

The following table is an excerpt from god.xsd for reference.

h     | Horizontal
sh    | Slanted Horizontal
u     | Upward horizontal
du    | Dot – Upward horizontal
v     | Vertical
sv    | Slanted Vertical
rsv   | Right Slanted Vertical
t     | Throw
ft    | Flat Throw
wt    | Wilted Throw
d     | Dot
ed    | Extended Dot
ld    | Left Dot
wd    | Wilted Dot
p     | Press
up    | Upward horizontal – Press
hp    | Horizontal – Press
fp    | Flat Press
ufp   | Upward horizontal – Flat Press
c     | Clockwise curve
a     | Anticlockwise curve
o     | Oval
hj    | Horizontal – J hook
uj    | Upward horizontal – J hook
ht    | Horizontal – Throw
hsv   | Horizontal – Slanted Vertical
hv    | Horizontal – Vertical
hvj   | Horizontal – Vertical – J hook
htj   | Horizontal – Throw – J hook
utj   | Upward horizontal – Throw – J hook
hvh   | Horizontal – Vertical – Horizontal
hvu   | Horizontal – Vertical – Upward horizontal
ha    | Horizontal – Anticlockwise curve
haj   | Horizontal – Anticlockwise curve – J hook
hpj   | Horizontal – Press – J hook
htaj  | Horizontal – Throw – Anticlockwise curve – J hook
htc   | Horizontal – Throw – Clockwise curve
htht  | Horizontal – Throw – Horizontal – Throw
htcj  | Horizontal – Throw – Clockwise curve – J hook
hvhv  | Horizontal – Vertical – Horizontal – Vertical
hthtj | Horizontal – Throw – Horizontal – Throw – J hook
vu    | Vertical – Upward horizontal
vh    | Vertical – Horizontal
va    | Vertical – Anticlockwise curve
vaj   | Vertical – Anticlockwise curve – J hook
vhv   | Vertical – Horizontal – Vertical
vht   | Vertical – Horizontal – Throw
vhtj  | Vertical – Horizontal – Throw – J hook
vj    | Vertical – J hook
vc    | Vertical – Clockwise curve
vcj   | Vertical – Clockwise curve – J hook
tu    | Throw – Upward horizontal
th    | Throw – Horizontal
td    | Throw – Dot
wtd   | Wilted Throw – Dot
tht   | Throw – Horizontal – Throw
thtj  | Throw – Horizontal – Throw – J hook
tj    | Throw – J hook
cj    | Clockwise curve – J hook
fpj   | Flat Press – J hook
pj    | Press – J hook
thtaj | Throw – Horizontal – Throw – Anticlockwise curve – J hook
tod   | Throw – Oval – Dot
Table 1. Inherited names of CJK basic and compound strokes (63 items)

Type  | Chinese name | Abbr. form | Full name | Name in Unicode | Examples
h     |  | H | Horizontal | H | 三 言 隹 花
sh    | 斜橫 | SH | Slanted Horizontal | (H) | 七 弋 宅 戈
u     |  | U | Upward horizontal | T | 刁 求 虫 地
du    | 點挑 | DU | Dot – Upward horizontal | (T) | 冰 冷 汗 汁
v     |  | V | Vertical | S | 十 圭 川 仆
sv    | 斜豎 | SV | Slanted Vertical | (S) | 丑 五 亙 貫
rsv   | 右斜豎 | RSV | Right Slanted Vertical | (S) | 𠙴
t     |  | T | Throw | P | 竹 大 乂 勿
ft    | 扁撇 | FT | Flat Throw | (P) | 千 乏 禾 斤
wt    | 直撇 | WT | Wilted Throw | SP | 九 厄 月 几
d     |  | D | Dot | D | 主 卜 夕 凡
ed    | 長點 | ED | Extended Dot | (D) | 囪 囟 这 凶
ld    | 左點 | LD | Left Dot | (D) | 心 忙 恭 烹
wd    | 直點 | WD | Wilted Dot | (D) | 六 文 宇 空
p     |  | P | Press | N | 人 木 尺 冬
up    | 挑捺 | UP | Upward horizontal – Press | TN | 文 廴 父 爻
hp    | 橫捺 | HP | Horizontal – Press | (TN) | 入 八 內 全
fp    | 扁捺 | FP | Flat Press | (N) | 走 足 廴 麵
ufp   | 挑扁捺 | UFP | Upward horizontal – Flat Press | (TN) | 之 乏 巡 迴
c     |  | C | Clockwise curve | W |
a     |  | A | Anticlockwise curve | X |
o     |  | O | Oval | Q | 〇 㔔 㪳 㫈
hj    | 橫鈎 | HJ | Horizontal – J hook | HG | 冧 欠 冝 蛋
uj    | 挑鈎 | UJ | Upward horizontal – J hook | (HG) | 也 乜 池 馳
ht    | 橫撇 | HT | Horizontal – Throw | HP | 夕 水 登 令
hsv   | 橫斜 | HSV | Horizontal – Slanted Vertical | (HP) | 今 彔 互 恆
hv    | 橫豎 | HV | Horizontal – Vertical | HZ | 口 己 臼 典
hvj   | 橫豎鈎 | HVJ | Horizontal – Vertical – J hook | HZG | 而 永 印 令
htj   | 橫撇鈎 | HTJ | Horizontal – Throw – J hook | (HZG) | 勺 方 力 母
utj   | 挑撇鈎 | UTJ | Upward horizontal – Throw – J hook | (HZG) | 也 乜 池 馳
hvh   | 橫豎橫 | HVH | Horizontal – Vertical – Horizontal | HZZ | 凹 兕 卍 雋
hvu   | 橫豎挑 | HVU | Horizontal – Vertical – Upward horizontal | HZT | 殼 鸠 说 计
ha    | 橫曲 | HA | Horizontal – Anticlockwise curve | HZW | 朵 沿 殳 没
haj   | 橫曲鈎 | HAJ | Horizontal – Anticlockwise curve – J hook | HZWG | 九 几 凡 亢
hpj   | 橫捺鈎 | HPJ | Horizontal – Press – J hook | (HZWG) | 風 迅 飛 凰
htaj  | 橫撇曲鈎 | HTAJ | Horizontal – Throw – Anticlockwise curve – J hook | HXWG | 乙 氹 乞 乭
htc   | 橫撇彎 | HTC | Horizontal – Throw – Clockwise curve | --- | 過 过 這 这
htht  | 橫撇橫撇 | HTHT | Horizontal – Throw – Horizontal – Throw | HZZP | 延 建 巡 及
htcj  | 橫撇彎鈎 | HTCJ | Horizontal – Throw – Clockwise curve – J hook | HPWG | 陳 陌 那 耶
hvhv  | 橫豎橫豎 | HVHV | Horizontal – Vertical – Horizontal – Vertical | HZZZ | 凸 𡸭 𠱂 𢫋
hthtj | 橫撇橫撇鈎 | HTHTJ | Horizontal – Throw – Horizontal – Throw – J hook | HZZZG | 乃 孕 仍 盈
vu    | 豎挑 | VU | Vertical – Upward horizontal | ST | 卬 氏 衣 比
vh    | 豎橫 | VH | Vertical – Horizontal | SZ | 山 世 匡 直
va    | 豎曲 | VA | Vertical – Anticlockwise curve | SW | 區 亡 四 匹
vaj   | 豎曲鈎 | VAJ | Vertical – Anticlockwise curve – J hook | SWG | 孔 已 亂 也
vhv   | 豎橫豎 | VHV | Vertical – Horizontal – Vertical | SZZ | 鼎 亞 吳 卐
vht   | 豎橫撇 | VHT | Vertical – Horizontal – Throw | (SZZ) | 奊 捑 𠱐 𧦮
vhtj  | 豎橫撇鈎 | VHTJ | Vertical – Horizontal – Throw – J hook | SZWG | 弓 弟 丐 弱
vj    | 豎鈎 | VJ | Vertical – J hook | SG | 小 水 到 寸
vc    | 豎彎 | VC | Vertical – Clockwise curve | SWZ | 肅 嘯 蕭 瀟
vcj   | 豎彎鈎 | VCJ | Vertical – Clockwise curve – J hook | --- | 𨙨 𨛜 𨞠 𨞰
tu    | 撇挑 | TU | Throw – Upward horizontal | PZ | 去 公 玄 鄉
th    | 撇橫 | TH | Throw – Horizontal | (SZ) | 互 母 牙 车
td    | 撇點 | TD | Throw – Dot | PD | 巡 兪 巢 粼
wtd   | 直撇點 | WTD | Wilted Throw – Dot | (PD) | 女 如 姦 㜢
tht   | 撇橫撇 | THT | Throw – Horizontal – Throw | (SZZ) | 夨 𠨮 专 砖
thtj  | 撇橫撇鈎 | THTJ | Throw – Horizontal – Throw – J hook | (SZWG) | 巧 亟 污 號
tj    | 撇鈎 | TJ | Throw – J hook | PG |
cj    | 彎鈎 | CJ | Clockwise curve – J hook | WG | 狗 豸 豕 象
fpj   | 扁捺鈎 | FPJ | Flat Press – J hook | BXG | 心 必 沁 厯
pj    | 捺鈎 | PJ | Press – J hook | XG | 弋 戈 我 銭
thtaj | 撇橫撇曲鈎 | THTAJ | Throw – Horizontal – Throw – Anticlockwise curve – J hook | --- | 𠃉 𦲳 𦴱 鳦
tod   | 撇圈點 | TOD | Throw – Oval – Dot | --- | 𡧑 𡆢

Processing this document with the glyph outline generator provided by Smaji CJKV produces the following outline file, which can be used in a font editor.

the outline of 516b

In GOD, glyphs can be composed not only from strokes but also from existing characters. For example, the character "丕" can be composed of the character "不" plus "一":

<?xml version="1.0"?>
<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="4e15,0">
    <ref unicode= "4e0d" x="0" y="0" width="128" height="120"/>
    <stroke type="h" x="0" y="114" width="128" height="14"/>
  </glyph>
</god>

Referencing a Unicode scalar value directly is precise, but for common, unambiguous characters, typing the character itself is also a good choice. The god file above can be rewritten in this form by changing line 4 to

<character utf8= "不" x="0" y="0" width="128" height="120"/>

which gives the following god file:

<?xml version="1.0"?>
<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="4e15,0">
    <character utf8= "不" x="0" y="0" width="128" height="120"/>
    <stroke type="h" x="0" y="114" width="128" height="14"/>
  </glyph>
</god>

This produces the following glyph outline:

the outline of 4e15
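How a ref or character element might be resolved can be sketched as follows. The sketch assumes, based only on the 丕 example (不 at y=0 with height 120, 一 at y=114 with height 14), that glyphs are drawn in a 128 × 128 design space with y increasing downward; the `Stroke` type and the `scalar_of` and `place` helpers are hypothetical, not Smaji CJKV code. It shows that both element forms resolve to the same scalar value, and that the referenced glyph's strokes are scaled from their design space into the target box.

```python
from dataclasses import dataclass

EM = 128  # assumed size of the design space; y grows downward


@dataclass(frozen=True)
class Stroke:
    """Hypothetical placed stroke: type code plus its box in design units."""
    kind: str
    x: float
    y: float
    width: float
    height: float


def scalar_of(attrs: dict) -> int:
    """Resolve <ref unicode="4e0d"/> or <character utf8="不"/> to a scalar value."""
    if "unicode" in attrs:
        return int(attrs["unicode"].split(",")[0], 16)
    text = attrs["utf8"]
    if len(text) != 1:
        raise ValueError("utf8 must contain exactly one character")
    return ord(text)


def place(strokes, x, y, width, height, em=EM):
    """Scale strokes drawn in an em x em space into the target box."""
    sx, sy = width / em, height / em
    return [
        Stroke(s.kind, x + s.x * sx, y + s.y * sy, s.width * sx, s.height * sy)
        for s in strokes
    ]


# Both element forms from the 丕 example name the same character.
assert scalar_of({"unicode": "4e0d"}) == scalar_of({"utf8": "不"}) == 0x4E0D

# A hypothetical full-height vertical stroke squeezed into the 128 x 120 box.
print(place([Stroke("v", 56, 0, 16, 128)], 0, 0, 128, 120))
```

Scaling each referenced glyph into its box, instead of copying its strokes unchanged, is what lets "不" be compressed to make room for the added "一".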

Let’s take a look at another glyph outline:

the outline of 2010f

Doesn't it look like "了" turned upside down? Indeed, Chinese includes left-right mirrored characters, up-down mirrored characters, and rotated characters, and the one shown here is a rotated character. So how is it described in GOD?

<?xml version="1.0"?>
<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="2010f,0" transform="rotate180">
    <character utf8="了" x="0" y="0" width="88" height="128" />
  </glyph>
</god>

One of the design principles of GOD is that Chinese characters that have gone through Liding (隶定) and Libian (隶变) are combinations of basic components and strokes, not manipulations of them. Mirroring and rotation therefore apply only to a character as a whole.

Accordingly, a transform attribute can be added to the glyph element, taking one of the following values:

  • mirror_horizontal

  • mirror_vertical

  • rotate180

Because the glyph of U+2010F is exactly the character "了" rotated by 180 degrees, line 3 of this god file sets the transform attribute to rotate180, and line 4 brings in the glyph of "了" as the basis. The result is the required glyph.
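Since a transform applies to the whole character, it can be modelled as one coordinate mapping applied to every outline point of the finished glyph. The sketch below is hypothetical: it assumes the 128-unit design space with y pointing down, that mirror_horizontal means a left-right flip and mirror_vertical an up-down flip, and that transforms act about the centre of the design space. A single reflection reverses the direction of every contour, and font formats generally expect a consistent contour direction, so mirrored contours are reversed again; a 180-degree rotation preserves direction.

```python
EM = 128  # assumed design-space size; y grows downward


def transform_point(point, transform, em=EM):
    """Map one outline point; transforms act about the centre of the em square."""
    x, y = point
    if transform is None:
        return (x, y)
    if transform == "mirror_horizontal":  # assumed: left-right flip
        return (em - x, y)
    if transform == "mirror_vertical":  # assumed: up-down flip
        return (x, em - y)
    if transform == "rotate180":
        return (em - x, em - y)
    raise ValueError(f"unknown transform {transform!r}")


def transform_outline(contours, transform, em=EM):
    """Apply a whole-glyph transform to a list of polygonal contours."""
    result = [[transform_point(p, transform, em) for p in c] for c in contours]
    if transform in ("mirror_horizontal", "mirror_vertical"):
        # A single reflection reverses winding; reverse each contour to restore it.
        result = [c[::-1] for c in result]
    return result


def signed_area(contour):
    """Shoelace formula; the sign encodes the contour direction."""
    return sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1])
    ) / 2


triangle = [(0, 0), (10, 0), (10, 10)]
print(transform_outline([triangle], "rotate180"))
```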

Smaji CJKV support for GOD

Smaji Glyph Outline

An OCaml library for reading, exporting, and converting glyph outline data and files.

Supported glyph outline formats are:

  • SVG (Scalable Vector Graphics), an extremely widely used format with an unusually rich set of vector graphics features.

  • GLIF (Glyph Interchange Format), the glyph format of the Unified Font Object (UFO).
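To give a feel for how GOD stroke boxes relate to SVG coordinates, here is a small sketch that writes each stroke's box as an unfilled SVG rectangle. This is only a bounding-box preview, not an outline export and not the Smaji Glyph Outline API; `boxes_to_svg` is hypothetical and assumes the 128-unit design space. SVG's y axis points downward, matching the direction implied by the 丕 example, so no flip is needed here.

```python
import xml.etree.ElementTree as ET


def boxes_to_svg(strokes, em=128):
    """Render (type, x, y, width, height) stroke boxes as an SVG preview string."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "viewBox": f"0 0 {em} {em}",
        "width": str(em),
        "height": str(em),
    })
    for kind, x, y, width, height in strokes:
        rect = ET.SubElement(svg, "rect", {
            "x": str(x), "y": str(y),
            "width": str(width), "height": str(height),
            "fill": "none", "stroke": "black",
        })
        ET.SubElement(rect, "title").text = kind  # tooltip with the stroke type
    return ET.tostring(svg, encoding="unicode")


# The two stroke boxes of 「八」 from the example document.
print(boxes_to_svg([("t", 0, 0, 56, 112), ("p", 76, 0, 56, 112)]))
```

GLIF, like most font coordinate systems, has its y axis pointing up, so an export to GLIF would additionally need a vertical flip.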

Smaji God

An OCaml library for reading, processing, and exporting GOD documents.

Smaji DynGlyph

An executable program that generates font outline files from GOD documents; the outline files can then be used to build fonts. It can also generate stroke animation files for demonstration purposes.

Smaji DynGlyph Collection

A git repository that stores sample basic stroke libraries used by the dyn-glyph program, as well as a collection of GOD documents submitted by users.

Online God Editor

Edit and submit god files online, and generate SVG outline files or animation files.

- ZAN DoYe



© 2024 ZAN DoYe